Contextual Decision Processes with low Bellman rank are PAC-Learnable
Abstract
This paper studies systematic exploration for reinforcement learning with rich observations and function approximation. We introduce a new model, called contextual decision processes, that unifies and generalizes most prior settings. Our first contribution is a complexity measure, the Bellman rank, which we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings. Our second contribution is a new reinforcement learning algorithm that engages in systematic exploration to learn contextual decision processes with low Bellman rank. Our algorithm provably learns near-optimal behavior with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for reinforcement learning with function approximation.
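To make "Bellman rank" concrete, here is a minimal sketch that assumes the definition as recalled from the paper, not as stated on this page. The average Bellman error of a function f' at level h, under a roll-in policy π_f, is E[f'(x_h, a_h) - r_h - f'(x_{h+1}, a_{h+1})]. Levels before h follow π_f, and a_h and a_{h+1} are chosen greedily by f'. The Bellman rank is the rank of the matrix of these errors indexed by (f, f'). The sketch builds a hypothetical layered tabular process with S states per level and a random finite function class. All names (`P`, `R`, `F`, `state_dist`, `bellman_error_per_state`) are illustrative, and levels are 0-indexed. It then checks that the error matrix factors through the level-h state distribution, so its rank is at most S. This is the usual intuition for why small-state MDPs have low Bellman rank.

```python
import numpy as np

# Toy sizes (assumed for illustration): horizon, states per level, actions, |F|.
H, S, K, N = 3, 3, 4, 20
rng = np.random.default_rng(0)

# Hypothetical layered tabular process: P[h, s, a] is a distribution over next
# states, R[h, s, a] an expected reward, d0 the initial state distribution.
P = rng.dirichlet(np.ones(S), size=(H, S, K))  # shape (H, S, K, S)
R = rng.uniform(size=(H, S, K))
d0 = rng.dirichlet(np.ones(S))

# Finite function class F: each f is a table f[h, s, a]; values past level H-1 are 0.
F = rng.normal(size=(N, H, S, K))


def greedy(f):
    """pi_f(h, s) = argmax_a f(h, s, a), returned as an (H, S) table of actions."""
    return f.argmax(axis=2)


def state_dist(f, h):
    """Distribution of the state at level h when levels 0..h-1 follow pi_f."""
    pi = greedy(f)
    d = d0.copy()
    for t in range(h):
        d = sum(d[s] * P[t, s, pi[t, s]] for s in range(S))
    return d


def bellman_error_per_state(f, h):
    """Bellman error of f at each level-h state, acting with pi_f at levels h and h+1."""
    pi = greedy(f)
    g = np.zeros(S)
    for s in range(S):
        a = pi[h, s]
        future = 0.0
        if h + 1 < H:
            future = sum(P[h, s, a, s2] * f[h + 1, s2, pi[h + 1, s2]] for s2 in range(S))
        g[s] = f[h, s, a] - R[h, s, a] - future
    return g


def avg_bellman_error(f_eval, f_roll, h):
    """Average Bellman error of f_eval at level h under roll-in policy pi_{f_roll}."""
    return state_dist(f_roll, h) @ bellman_error_per_state(f_eval, h)


h = 1
# Row i: roll-in policy pi_{F[i]}; column j: function F[j] whose error is measured.
E = np.array([[avg_bellman_error(F[j], F[i], h) for j in range(N)] for i in range(N)])
print(E.shape, np.linalg.matrix_rank(E), "<=", S)
```

Each entry is a dot product of a state distribution (depending only on the roll-in function) with a vector of per-state Bellman errors (depending only on the evaluated function). The N×N matrix therefore has rank at most S however large N is. At `h = 0` every roll-in induces the same distribution `d0`, so the rank drops to at most 1.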
Similar resources
Rank-r Decision Trees are a Subclass of r-Decision Lists
Rivest [5] defines the notion of a decision list as a representation for Boolean functions. He shows that k-decision lists, a generalization of k-CNF and k-DNF formulas, are learnable for constant k in the PAC (or distribution-free) learning model [&,3]. Ehrenfeucht and Haussler [1] define the notion of the rank of a decision tree, and prove that decision trees of constant rank are also learnab...
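The following minimal sketch illustrates the decision-list representation and the greedy idea usually associated with Rivest's learnability result for k-decision lists. Conceptually, a decision list is an ordered list of (term, label) pairs, and an input receives the label of the first term it satisfies. The learner repeatedly picks a conjunction of at most k literals whose covered examples all share one label, appends it, and discards those examples. The term encoding (tuples of `(index, value)` pairs) and all function names are hypothetical choices for this illustration, not taken from the paper.

```python
from itertools import combinations, product


def terms_up_to_k(n, k):
    """All conjunctions of at most k literals over x[0..n-1].

    A term is a tuple of (index, required_value) pairs; () is the always-true term.
    """
    terms = [()]
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for vals in product((0, 1), repeat=size):
                terms.append(tuple(zip(idxs, vals)))
    return terms


def satisfies(x, term):
    return all(x[i] == v for i, v in term)


def evaluate(dlist, x, default=0):
    """Return the label of the first (term, label) pair whose term x satisfies."""
    for term, label in dlist:
        if satisfies(x, term):
            return label
    return default


def learn_k_decision_list(examples, n, k):
    """Greedy consistent learner; returns None if no k-decision list fits the sample."""
    remaining = list(examples)
    dlist = []
    terms = terms_up_to_k(n, k)
    while remaining:
        for term in terms:
            labels = {y for x, y in remaining if satisfies(x, term)}
            if len(labels) == 1:  # term covers some examples, all with the same label
                dlist.append((term, labels.pop()))
                remaining = [(x, y) for x, y in remaining if not satisfies(x, term)]
                break
        else:
            return None  # no candidate term is pure on the remaining examples
    return dlist


# Usage: learn (x0 AND x1) OR x2 from its full truth table with k = 2.
data = [(x, int((x[0] and x[1]) or x[2])) for x in product((0, 1), repeat=3)]
dl = learn_k_decision_list(data, n=3, k=2)
print(dl)
```

For constant k the candidate set grows only polynomially in n, which is what makes the greedy search efficient. Three-variable parity, by contrast, has no 1-decision list, and the learner reports this by returning `None`.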
Simple PAC Learning of Simple Decision Lists
We prove that log n-decision lists (the class of decision lists such that all their terms have low Kolmogorov complexity) are learnable in the simple PAC learning model. The proof is based on a transformation from an algorithm based on equivalence queries (found independently by Simon). Then we introduce the class of simple decision lists, and extend our algorithm to show that simple decision l...
APPLICATION OF THE BELLMAN AND ZADEH'S PRINCIPLE FOR IDENTIFYING THE FUZZY DECISION IN A NETWORK WITH INTERMEDIATE STORAGE
In most real-life applications we deal with the problem of transporting special fruits, such as bananas, which have particular production and distribution processes. In this paper we restrict our attention to formulating and solving a new bi-criterion problem on a network in which, in addition to minimizing the traversing costs, admissibility of the quality level of fruits is a main objecti...
Predictive PAC Learning and Process Decompositions
We informally call a stochastic process learnable if it admits a generalization error approaching zero in probability for any concept class with finite VC-dimension (IID processes are the simplest example). A mixture of learnable processes need not be learnable itself, and certainly its generalization error need not decay at the same rate. In this paper, we argue that it is natural in predictiv...
PAC = PAExact and Other Equivalent Models in Learning
The Probably Almost Exact model (PAExact) [BJT02] can be viewed as the Exact model relaxed so that 1. the counterexamples to equivalence queries are distributionally drawn rather than adversarially chosen, and 2. the output hypothesis is equal to the target with negligible error (1/ω(poly) for any poly). This model allows studying (Almost) Exact learnability of infinite classes and is in some sense ...